Indexing text with approximate q-grams

نویسندگان
چکیده

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Indexing Text with Approximate q-Grams

We present a new index for approximate string matching. The index collects text q-samples, that is, disjoint text substrings of length q, at fixed intervals and stores their positions. At search time, part of the text is filtered out by noticing that any occurrence of the pattern must be reflected in the presence of some text q-samples that match approximately inside the pattern. Hence the inde...

متن کامل

Approximate String Matching with Ordered q-Grams

Approximate string matching with k differences is considered. Filtration of the text is a widely adopted technique to reduce the text area processed by dynamic programming. We present sublinear filtration algorithms based on the locations of q-grams in the pattern. Samples of q-grams are drawn from the text at fixed periods, and only if consecutive samples appear in the pattern approximately in...

متن کامل

Indexing DNA Sequences Using q-Grams

We have observed in recent years a growing interest in similarity search on large collections of biological sequences. Contributing to the interest, this paper presents a method for indexing the DNA sequences efficiently based on q-grams to facilitate similarity search in a DNA database and sidestep the need for linear scan of the entire database. Two level index – hash table and c-trees – are ...

متن کامل

Using q-grams in a DBMS for Approximate String Processing

String data is ubiquitous, and its management has taken on particular importance in the past few years. Approximate queries are very important on string data. This is due, for example, to the prevalence of typographical errors in data, and multiple conventions for recording attributes such as name and address. Commercial databases do not support approximate string queries directly, and it is a ...

متن کامل

Tuning BNDM with q-Grams

We develop bit-parallel algorithms for exact string matching. Our algorithms are variations of the BNDM and Shift-Or algorithms. At each alignment the algorithms read a q-gram before testing the state variable. In addition we apply reading a 2-gram in one instruction. Our experiments show that many of the new variations are substantially faster than any previous string matching algorithm on x86...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Journal of Discrete Algorithms

سال: 2005

ISSN: 1570-8667

DOI: 10.1016/j.jda.2004.08.003